Add config filter parameter to simulate_factors() #14

Merged
merged 8 commits into westerndigitalcorporation:master from bgmerrell:master
May 5, 2017

Conversation


@bgmerrell bgmerrell commented May 1, 2017

A new optional config_filter parameter is added to the simulate_factors()
method. The config filter is a function which, when not None, is applied to
each config from a multi-factor simulation. Only the simulations for which
the config filter function returns true are run.

This helps with orchestrating the execution of multiple per-factor
simulations on separate machines (e.g., in the cloud) from a single
multi-factor command. For example, snowbird could pass the following
config_filter argument to simulate_factors():

def only_factor_filter_fn(cfg):
    return cfg['meta.sim.index'] == args.only_factor

In the case that args.only_factor = 1, only the simulation at factor index 1
would be executed, and the user would see the following artifacts (assuming
a workspace of ~/locker/test):

~/locker/test/1/config.yaml
~/locker/test/1/result.yaml
~/locker/test/1/sim.log
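
For illustration, a call using this filter might look like the following
(a minimal sketch; TopLevel, base_config, and the factor list are
hypothetical placeholders, not part of this change):

    from desmod.simulation import simulate_factors

    base_config = {'sim.duration': '10 ms', 'sim.workspace': '~/locker/test'}
    factors = [(['host.queue_depth'], [[4], [16]])]

    # Only configs for which the filter returns true are simulated.
    results = simulate_factors(base_config, factors, TopLevel,
                               config_filter=only_factor_filter_fn)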

Add a new optional `only_factor` parameter for the simulate_factors()
method.

If the only_factor parameter is specified, only a single simulation will
be run. The simulation that runs is the one for the factor at the index
specified by the value of the only_factor argument.

For example, if the snowbird command-line arguments look like this:

	meta.cmd_line: sb -f host.queue_depth 4,16 -f fm.fim.xfer_rate_MBps 400,800 -f fm.ase.t_wait_rd 4,5,6 -s duration 10 ms --only-factor 4

Then snowbird will pass in only_factor=4 as the argument to desmod and
the meta.sim.index and meta.sim.workspace will be set to 4 and
workspace/4 respectively.

Additionally, meta.sim.special will look like this:

    meta.sim.special:
    - [host.queue_depth, 4]
    - [fm.fim.xfer_rate_MBps, 800]
    - [fm.ase.t_wait_rd, 5]

That is, only a single simulation, for the factor indicated by
meta.sim.special, was run.

One use for this new feature is orchestrating the execution of multiple,
per-factor, simulations on separate machines (e.g., in the cloud) with a
single command (with multiple factors).
Some new config options have been added:
 - sim.workspace.s3_sync: When true, syncs any workspaces to s3
 - sim.workspace.s3_bucket: The s3 bucket to sync to
 - sim.workspace.s3_sync_prefix: A string to act as the root path in the
   s3 bucket.  By default, this is set to the user's username.

Here's an example of running snowbird with these new options:

$ sb -s sim.workspace ~/locker/test -s sim.workspace.s3_sync_prefix bgmerrell \
     -n test-rd-rnd -s sim.duration 10ms -s sim.workspace.s3_sync True \
     -f host.queue_depth 4,16

This would result in the following being synced to the default bucket in
s3:

/bgmerrell/test/0/config.yaml
/bgmerrell/test/0/result.yaml
/bgmerrell/test/0/sim.log
/bgmerrell/test/1/config.yaml
/bgmerrell/test/1/result.yaml
/bgmerrell/test/1/sim.log

The method to do the sync work is called when the _Workspace context
manager is exiting.

The uploading to s3 is done using the concurrent.futures package.  This
package isn't in the Python 2 standard library, so the requirements.txt
file has been updated to install the backport package for Python 2 only.
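
A rough sketch of the upload approach described above (the helper shape is
illustrative; only the config key names come from this change):

    from concurrent.futures import ThreadPoolExecutor

    import boto3


    def upload_artifacts(config, artifacts):
        """Concurrently upload workspace artifacts to S3 (illustrative sketch)."""
        client = boto3.client('s3')
        bucket = config['sim.workspace.s3_bucket']
        prefix = config['sim.workspace.s3_sync_prefix']

        def upload(artifact):
            # artifact paths from os.walk(os.curdir) start with './'; strip it
            dest = '/'.join([prefix, artifact[2:]])
            client.upload_file(artifact, bucket, dest)

        with ThreadPoolExecutor(max_workers=4) as executor:
            # list() forces iteration so any upload exception is raised here
            list(executor.map(upload, artifacts))
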
@bgmerrell bgmerrell closed this May 1, 2017
@bgmerrell bgmerrell reopened this May 1, 2017
coveralls commented May 1, 2017

Coverage decreased (-1.1%) to 94.747% when pulling 006567d on bgmerrell:master into 84edba4 on SanDisk-Open-Source:master.

@bgmerrell bgmerrell closed this May 1, 2017
@bgmerrell bgmerrell reopened this May 1, 2017
coveralls commented May 1, 2017

Coverage decreased (-1.1%) to 94.747% when pulling 006567d on bgmerrell:master into 84edba4 on SanDisk-Open-Source:master.

@jpgrayson jpgrayson (Collaborator) left a comment

I like this capability, but there are several wrinkles to smooth out.

os.chdir(self.orig_dir)

def sync(self):
if self.config['sim.workspace.s3_sync']:
Collaborator:

Within desmod, we want to avoid fatal effects if the user provides an under-specified config. And when the user under-specifies and desmod uses a default, we want the config dict to be updated to reflect the default that desmod used--i.e. desmod's implicit defaults become explicit.

To that end, we should use self.config.setdefault() to interrogate the config dict here.

W.r.t. the config key name, I also advocate that it be sim.workspace.s3_sync.enable. The ".enable" suffix has a lot of precedent in other config keys in the "sim" namespace. We could probably make a good case for some other naming strategies:
self.config.setdefault('sim.workspace.s3_sync.enable', False)
self.config.setdefault('sim.s3_sync.enable', False)
self.config.setdefault('sim.sync.s3.enable', False)

If we are going to have other workspace synchronization methods, then the last option creates a clean namespace.

N.B. that the two other new config items should fit into this scheme. E.g.:

sim.sync.s3.bucket
sim.sync.s3.prefix

Contributor Author:

self.config.setdefault('sim.sync.s3.enable', False)

Cool. My original approach was to use sim.workspace.sync.s3.enable, but I changed my mind because I thought it might match existing patterns better to be able to simply specify -s s3_sync True on the snowbird command-line. I'm happy to be wrong on this one.

One question: do you think I should include "workspace" in sim.workspace.sync.s3.enable now that the syncing takes place in _Workspace, or is that an unnecessary coupling?

Collaborator:

My bias is to use sim.sync.s3. as the base namespace. In addition to being decoupled from sim.workspace, it also ends up being less typing and a shallower hierarchy--both positives.

@@ -0,0 +1,59 @@
from concurrent.futures import ThreadPoolExecutor

import boto3
Collaborator:

We should defer importing boto3 until we know that S3 synchronization is enabled. That allows boto3 to be an optional dependency as opposed to a required dependency. The same holds for concurrent.futures.

My bias is that most people most of the time will not be using the S3 sync feature and thus keeping the requirements list shorter outweighs the additional burden this would put on users who do want to use S3 sync.

Note that we do deferred import (non import-time import) of the progressbar library as well.
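
As a minimal sketch of what the deferred import might look like (the helper
name and the sim.sync.s3.* keys follow the naming discussed in this review
and are otherwise hypothetical):

    def sync_to_s3(config, artifacts):
        if not config.setdefault('sim.sync.s3.enable', False):
            return
        # Deferred (non import-time) imports: boto3 and the concurrent.futures
        # backport only need to be installed when S3 sync is actually enabled.
        import boto3
        from concurrent.futures import ThreadPoolExecutor

        client = boto3.client('s3')
        bucket = config['sim.sync.s3.bucket']
        with ThreadPoolExecutor(max_workers=4) as executor:
            for artifact in artifacts:
                # artifact paths from os.walk(os.curdir) start with './'
                executor.submit(client.upload_file, artifact, bucket, artifact[2:])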

artifacts = []
for root, _, files in os.walk(os.curdir, topdown=False):
for filename in files:
artifacts.append(os.path.join(root, filename))
Collaborator:

It is a little curious that _Workspace has responsibility for walking the artifacts versus S3Sync taking that on itself. Is the idea that if we have multiple synchronization backends, we could reuse the same artifacts list?

Contributor Author:

I originally had this logic in workspacesync/s3.py and moved it into _Workspace for exactly the reason you asked about. Having the sync backends use the same artifacts input is nice for consistency, code reuse, and saving computational cycles. That said, I'm solving a problem that doesn't exist yet, so I'm happy to move it back to s3.py for now if you think that is better.

Collaborator:

<shrug> I think it's fine as-is.

username = getpass.getuser()
except:
username = "unknown"
config['sim.workspace.s3_sync_prefix'] = username
Collaborator:

I'm not sure this is the right policy. While it seems like certain desmod users might have a policy that requires organizing synchronized artifacts by username, I don't think it is universal enough to drive a default in desmod. Instead, I think that the default prefix should simply be "/" and users should have to compose their own prefix that incorporates usernames, timestamps, or other such metadata.

import os


class S3Sync(object):
Collaborator:

It seems like this could have been implemented as a regular function instead of as a class with a single method. I.e.

sync_to_s3(...)

versus:

S3Sync(...).sync()

I don't see it as imperative that we change this. It is not a user-facing API and we could thus revisit later if desired.

Contributor Author:

The main reason I went with this approach was to perhaps have a common set of methods for multiple backend synchronizers, which could register themselves and perhaps one day be called in a loop like so:

for cls in sync_registry:
    cls(config, artifacts).sync()

That said, that's a lot of hand waving, and we only have one sync backend now, so I probably shouldn't be solving problems that don't exist yet.

Collaborator:

Even without using classes, each synchronization backend module could register its sync() function. Lots of ways to skin this. As long as these synchronization backends are all driven by configuration and not by user-facing API, we retain lots of license to revisit this detail.

I do think there is at least one compelling case for having a class: extract and check the configuration before simulation.

os.path.split(self.workspace)[1],
(artifact[2:]))
self.client.upload_file(
artifact, self.config['sim.workspace.s3_sync_bucket'], dest)
Collaborator:

We should extract the required configuration prior to submitting to the thread pool. The config is user input and thus may have all manner of garbage in it.

A good case for retaining the class-oriented strategy here would be so that we can extract configuration in __init__() before simulation and then have confidence that the user provided valid configuration well before end of simulation.

class _Workspace(...):
    def __init__(self, ...):
        ...
        if config.setdefault('sim.sync.s3.enable', False):
            self.s3_sync = S3Sync(...)
        else:
            self.s3_sync = None
        ...
    def __exit__(self, ...):
        ...
        if self.s3_sync:
            self.s3_sync.sync()

class S3Sync(...):
    def __init__(self, config, ...):
        self.bucket = config['sim.sync.s3.bucket']  # user must provide bucket
        self.prefix = config.setdefault('sim.sync.s3.prefix', '/')
        ...

if len(configs) == 0:
raise ValueError('No factor at index {}'.format(only_factor))
assert len(configs) == 1, 'exactly one config expected when ' \
'--only_factor is used'
Collaborator:

The string associated with this assert presumes a --only_factor command line option, but that's not a valid assumption.

desmod/config.py Outdated
:param int only_factor:
When specified, only a single config will be yielded; the config will
contain only the single factor at the specified index of the list of
the cartesian product of the unrolled factors.
Collaborator:

I remain against plumbing this only_factor kwarg into factorial_config(). We can isolate this capability to simulate_factors(). I know that imposes a somewhat larger runtime cost, but I see that cost as marginal and the conceptual complexity of spreading this feature across two modules as the more important concern.

@@ -205,7 +220,7 @@ def simulate(config, top_type, env_type=SimEnvironment, reraise=True,


def simulate_factors(base_config, factors, top_type,
env_type=SimEnvironment, jobs=None):
env_type=SimEnvironment, jobs=None, only_factor=None):
Collaborator:

What if instead of having this new only_factor argument, we instead had a filter function? I.e. the user would pass in a function that evaluated whether each config was either included or not. E.g.:

simulate_factors(..., config_filter=lambda config: config['meta.sim.index'] == 42)

This would make this feature more general. I could, for example, envision this being useful for at least two other applications:

  • Selecting more than one index: config_filter=lambda c: c['meta.sim.index'] in [42, 142].
  • Sampling configs from a very large configuration space: config_filter=lambda c: random.random() < 0.1. I.e. pick a random 10% of the factorial configs.

I think it would be more reasonable to push this more general capability into factorial_config() than the specific only_factor capability. Unfortunately, I don't think it would afford the same performance optimization that plumbing only_factor affords.

N.B. if we pushed this filter function capability down to factorial_config(), that function would have to take responsibility for assigning meta.sim.index. We currently do that assignment in simulate_factors().
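
A rough sketch of how simulate_factors() could apply such a filter without
touching factorial_config() (the special key and workspace handling shown
here are assumptions based on the existing behavior):

    configs = list(factorial_config(base_config, factors, 'meta.sim.special'))
    for index, config in enumerate(configs):
        config['meta.sim.index'] = index
        config['meta.sim.workspace'] = os.path.join(
            config['sim.workspace'], str(index))
    if config_filter is not None:
        configs = [config for config in configs if config_filter(config)]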

* only_factor -> config_filter
* use setdefault() for some workspace sync config keys
* default s3 sync prefix is now ""
* don't import boto3 or concurrent.futures until needed.
* sim.workspace.s3_sync -> sim.sync.s3.enable (and similar for other s3
  sync-related keys)
@@ -5,6 +5,12 @@
from multiprocessing import cpu_count, Process, Queue
from pprint import pprint
from threading import Thread
try:
from future_builtins import filter
Collaborator:

We use six for Python 2/3 compatibility. To get filter, we can unconditionally:

from six.moves import filter

desmod/config.py Outdated
@@ -271,7 +271,7 @@ def factorial_config(base_config, factors, special_key=None):
for keys, values_list in factors:
unrolled_factors.append([(keys, values) for values in values_list])

for keys_values_lists in product(*unrolled_factors):
for idx, keys_values_lists in enumerate(product(*unrolled_factors)):
Collaborator:

Remove enumerate() to completely unwind the changes in this function.

os.chdir(self.config['sim.workspace'])
self.sync()
finally:
os.chdir(self.orig_dir)
Collaborator:

When __exit__() is called, we have a strong expectation that the current directory will be meta.sim.workspace. In multi-simulation cases, that would be something like ws/42. The new code here changes to the sim.workspace directory prior to calling sync(); i.e. just ws, not ws/42 where this simulation's artifacts reside.

I question whether it is right for sync() to operate outside of the individual simulation's workspace.

It seems like this paradigm of changing to the multi-sim session level directory (sim.workspace) can only work correctly if the new config_filter is used to limit the multi-simulation session to exactly one simulation. Otherwise, each of many simulations would call sync() such that they would all try to sync ws/0, ws/1, ..., ws/N.

To put a fine point on it, what I'm suggesting is that the control-flow here should be:

def __exit__(...):
    self.sync()
    os.chdir(self.orig_dir)

Sorry I didn't pick up on this in my first review.

@@ -124,11 +131,12 @@ def schedule(self, delay=0):
class _Workspace(object):
"""Context manager for workspace directory management."""
def __init__(self, config):
self.config = config
Collaborator:

I maintain that it is important to fully extract all configuration from config here in _Workspace.__init__(). Doing so would obviate the need to capture config with the self.config instance attribute. It would also demand that we instead instantiate an S3Sync instance. I.e.:

def __init__(self, config):
    self.s3_sync = S3Sync(config)
    ...
def sync(self):
    if self.s3_sync.is_enabled:
        ...
        self.s3_sync.sync(artifacts)

We also want S3Sync.__init__() to follow suit and extract all needed config as instance attributes such that the config dict is not retained as an instance attribute and is thus not referenced in S3Sync.sync().

artifact, self.config['sim.sync.s3.bucket'], dest)

def sync(self):
"""Concurrently upload the artifacts to s3."""
Collaborator:

A potential problem with sync() is that it does not sync--it only uploads new files.

I assume client.upload_file() will happily overwrite files without error.

What is missing is any concept of removing files that exist in S3, but are not in the list of artifacts to synchronize. We would ostensibly need to interrogate S3 with list_objects() to figure out which extra files (objects) need to be deleted.

However, doing that alone would leave extra per-simulation directories unaccounted for. E.g. using the same workspace, if I first run a multi-factor set of 10 simulations and then run again with only 5 simulations, even if we synchronized per-simulation directories 0..4 correctly, S3 would still contain bogus/stale directories 5..9.
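
For reference, a true sync would need something along these lines to prune
stale objects (an illustrative sketch using boto3's paginator; not part of
this PR):

    def delete_stale_objects(client, bucket, prefix, keep_keys):
        """Delete objects under prefix that are not in keep_keys."""
        stale = []
        paginator = client.get_paginator('list_objects_v2')
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get('Contents', []):
                if obj['Key'] not in keep_keys:
                    stale.append({'Key': obj['Key']})
        # delete_objects accepts at most 1000 keys per request
        for start in range(0, len(stale), 1000):
            client.delete_objects(
                Bucket=bucket, Delete={'Objects': stale[start:start + 1000]})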

@@ -37,6 +37,7 @@ def config():
'sim.vcd.start_time': '',
'sim.vcd.stop_time': '',
'sim.workspace': 'workspace',
'sim.sync.s3.enable': False,
Collaborator:

Seems like since sim.sync.s3.enable defaults to False, we should not have had to add this here. Is it really needed?

The sync code will not live in desmod.  We've decided to take a
different approach.
coveralls commented May 5, 2017

Coverage increased (+0.03%) to 95.914% when pulling aa35322 on bgmerrell:master into 84edba4 on SanDisk-Open-Source:master.

@jpgrayson jpgrayson merged commit 3bd2ea2 into westerndigitalcorporation:master May 5, 2017
@bgmerrell bgmerrell changed the title Optionally sync workspaces to s3 Add config filter parameter to simulate_factors() May 5, 2017